
Conversation


@quge009 quge009 commented Oct 27, 2025

This PR is mainly about improving the user experience.

  • Changes made to optimize users' perceived latency and reading experience:
    • Implement streaming output in the LLMSession class and switch the final answer generation call to streaming, so the answer is posted to the user as soon as the first tokens are ready (see the first sketch after this list).
    • Implement the push_frontend method, which leverages the streaming output to send CoPilot progress status messages to the user in real time, managing expectations while they wait for the answer.
    • Add an auto-scroll feature to the frontend plugin to improve readability.
  • Changes made to reduce the average_response_latency (defined as the time between receiving a question and posting the answer):
    • Refactor several components (SmartHelp, LTP, ...) into classes, so that state can be preserved when necessary.
    • Reuse the same llm_session instance for requests within the same conversation, avoiding unnecessary HTTPS re-connections during initialization.
    • Implement a new question-parsing function that combines the contextualization and classification LLM calls into a single call (see the second sketch after this list).
    • Move prompt reading to instance initialization, to avoid unnecessary file I/O.
  • A minor bug fix is also included:
    • Correct the assignment of 'turnId' sent to the frontend.
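A minimal sketch of the streaming path described above, assuming an OpenAI-compatible chat-completions client. Method and parameter names such as chat_stream and stream_callback are illustrative assumptions, not the exact implementation in this PR:

```python
# Sketch only: illustrates streaming output plus a per-instance callback that
# pushes progress/status messages and partial answers to the frontend.
from typing import Callable, Iterator, Optional


class LLMSession:
    def __init__(self, client, model: str,
                 stream_callback: Optional[Callable[[str], None]] = None):
        self._client = client                      # reused for requests in the same conversation
        self._model = model
        self._stream_callback = stream_callback    # per-instance callback to the frontend

    def push_frontend(self, status: str) -> None:
        """Send a progress/status message to the user while they wait for the answer."""
        if self._stream_callback:
            self._stream_callback(status)

    def chat_stream(self, messages: list) -> Iterator[str]:
        """Yield answer tokens as soon as they arrive instead of waiting for the full reply."""
        response = self._client.chat.completions.create(
            model=self._model,
            messages=messages,
            stream=True,
        )
        for chunk in response:
            delta = chunk.choices[0].delta.content
            if delta:
                if self._stream_callback:
                    self._stream_callback(delta)    # post the partial answer immediately
                yield delta
```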
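And a rough sketch of combining the contextualization and classification calls into one LLM call; the prompt text, field names, and the llm_session.chat helper are hypothetical, used only to show the idea:

```python
# Sketch only: one prompt returns both the standalone (contextualized) question
# and its category, replacing two sequential LLM calls.
import json

PARSE_PROMPT = (
    "Given the conversation history and the latest user message, return a JSON "
    'object with two fields: "contextualized_question" (the question rewritten '
    'to stand alone) and "category" (one of ["smart_help", "ltp", "other"]).'
)


def parse_question(llm_session, history: str, question: str) -> dict:
    """Combine contextualization and classification into one call to cut latency."""
    messages = [
        {"role": "system", "content": PARSE_PROMPT},
        {"role": "user", "content": f"History:\n{history}\n\nQuestion:\n{question}"},
    ]
    raw = llm_session.chat(messages)   # hypothetical single, non-streaming call
    return json.loads(raw)             # {"contextualized_question": ..., "category": ...}
```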

Effectiveness of this PR:

  • Impact on accuracy
    • No change
  • Impact on response latency
    • ~15% response time reduction on average
    • ~50% response time reduction for extremely simple questions

@quge009 quge009 changed the title tmp Improve Performance: CoPilot, response latency, user expectation Oct 28, 2025
@quge009 quge009 changed the title Improve Performance: CoPilot, response latency, user expectation Improve Performance: CoPilot: response latency, user expectation Oct 28, 2025
@quge009 quge009 changed the title Improve Performance: CoPilot: response latency, user expectation Improve Performance: CoPilot: users' perceived response latency Oct 28, 2025
@quge009 quge009 marked this pull request as ready for review October 28, 2025 20:04

Copilot AI left a comment


Pull Request Overview

This pull request refactors the CoPilot chat agent to improve scalability and add streaming support. The main changes include:

  • Refactoring global singletons to instance-based LLM sessions: Removes global LLMSession() instances and passes them as parameters to avoid blocking in multi-user scenarios (a rough sketch of this pattern follows this overview)
  • Adding streaming support: Implements Server-Sent Events (SSE) streaming for real-time response delivery to the frontend
  • Improving thread safety: Adds locks for authentication state and introduces per-instance stream callbacks
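A rough sketch of that pattern, assuming sessions are cached per conversation behind a lock and handed to components as a parameter; the get_session helper and its names are hypothetical:

```python
# Sketch only: replaces a global LLMSession() singleton with per-conversation
# instances so concurrent users do not block each other.
import threading

_sessions = {}                      # conversation_id -> LLMSession
_sessions_lock = threading.Lock()


def get_session(conversation_id: str, factory):
    """Return the LLMSession for this conversation, creating it on first use."""
    with _sessions_lock:
        session = _sessions.get(conversation_id)
        if session is None:
            session = factory()     # builds the HTTPS connection once, then reuses it
            _sessions[conversation_id] = session
        return session
```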

Reviewed Changes

Copilot reviewed 26 out of 28 changed files in this pull request and generated 9 comments.

Summary per file:

  • src/copilot-chat/src/copilot_agent/utils/llmsession.py – Added streaming methods, per-instance callbacks, config caching, and thread-safety improvements
  • src/copilot-chat/src/copilot_agent/utils/summary.py – Updated to accept an llm_session parameter instead of using a global instance
  • src/copilot-chat/src/copilot_agent/utils/smart_help.py – Refactored from a function to a class-based SmartHelp for better state management
  • src/copilot-chat/src/copilot_agent/ltp/ltp.py – Converted from module-level functions to an LTP class with instance-based session handling
  • src/copilot-chat/src/copilot_agent/copilot_service.py – Added a streaming endpoint and per-user/per-conversation session management
  • src/copilot-chat/src/copilot_agent/copilot_conversation.py – Updated to use per-conversation LLM sessions
  • src/copilot-chat/src/copilot_agent/utils/push_frontend.py – New module for pushing events to the frontend via streaming
  • contrib/copilot-plugin/src/app/ChatBox.tsx – Frontend updated to consume SSE streaming responses
  • contrib/copilot-plugin/src/app/ChatHistory.tsx – Enhanced auto-scroll behavior for streaming updates
Files not reviewed (1)
  • contrib/copilot-plugin/package-lock.json: Language not supported
Comments suppressed due to low confidence (3)

src/copilot-chat/src/copilot_agent/copilot_conversation.py:205

  • This comment appears to contain commented-out code.
        # try:
        #     push_frontend_meta(response_message_info)
        # except Exception:
        #     logger.debug('Failed to push early meta event for streaming client')

src/copilot-chat/src/copilot_agent/utils/dcw.py:123

    full_dcw = gen_dcw(user_prompt, map_existing)

src/copilot-chat/src/copilot_agent/copilot_turn.py:56

  • This statement is unreachable.
            question = this_inquiry


@quge009 quge009 changed the title Improve Performance: CoPilot: users' perceived response latency Improve Performance: CoPilot: users experience Oct 28, 2025
quge009 and others added 26 commits November 13, 2025 15:30
…on to make sure each thread uses its session

@hippogr hippogr left a comment


I left a single minor comment for you to review and resolve. Other than that, I think we are good to go. ✌️

